Overview

Dataset Statistics

Number of Variables 5
Number of Rows 7613
Missing Cells 2594
Missing Cells (%) 6.8%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 2.3 MB
Average Row Size in Memory 311.9 B
Variable Types
  • Numerical: 1
  • Categorical: 4
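The Missing Cells figure is the sum of the per-variable missing counts reported below (61 for keyword plus 2533 for location). Its percentage is taken over all cells, that is, variables × rows. A minimal sketch of that arithmetic:

```python
def missing_cell_pct(n_missing: int, n_vars: int, n_rows: int) -> float:
    """Percentage of missing cells over all n_vars * n_rows cells."""
    return 100.0 * n_missing / (n_vars * n_rows)


n_missing = 61 + 2533  # keyword + location missing values
pct = missing_cell_pct(n_missing, n_vars=5, n_rows=7613)
print(n_missing, f"{pct:.1f}%")  # 2594 6.8%
```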

Dataset Insights

  • id is uniformly distributed (Uniform)
  • location has 2533 (33.27%) missing values (Missing)
  • keyword has a high cardinality: 221 distinct values (High Cardinality)
  • location has a high cardinality: 3341 distinct values (High Cardinality)
  • text has a high cardinality: 7503 distinct values (High Cardinality)
  • target has constant length 1 (Constant Length)

Variables


id

numerical

Approximate Distinct Count 7613
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 121808 B
Mean 5441.9348
Minimum 1
Maximum 10873
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • id is uniformly distributed
  • id is nearly symmetric, with a negligible right skew (γ1 = 0.0076)

Quantile Statistics

Minimum 1
5th Percentile 548.4
Q1 2734
Median 5408
Q3 8146
95th Percentile 10356.2
Maximum 10873
Range 10872
IQR 5412
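Range and IQR follow directly from the table (10873 - 1 = 10872; 8146 - 2734 = 5412). Fractional entries such as the 5th percentile (548.4) indicate interpolation between neighbouring sorted values. The sketch below assumes linear interpolation, the default rule in NumPy's `quantile`, which may differ from the method the report actually uses:

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q           # fractional index into the sorted data
    lo = int(pos)                     # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def spread(values):
    """Return (range, IQR) for a numeric sequence."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return max(values) - min(values), q3 - q1


print(spread([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (8, 4.0)
```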

Descriptive Statistics

Mean 5441.9348
Standard Deviation 3137.1161
Variance 9.8415e+06
Sum 4.1429e+07
Skewness 0.007605
Kurtosis -1.1892
Coefficient of Variation 0.5765
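Several rows can be checked against the others: Variance is the squared Standard Deviation (3137.1161² ≈ 9.8415e+06), Sum is Mean × Number of Rows (5441.9348 × 7613 ≈ 4.1429e+07), and the Coefficient of Variation is Standard Deviation / Mean (≈ 0.5765). The Kurtosis of -1.1892 must be excess kurtosis, since raw kurtosis cannot be negative, and it is close to -1.2, the value for a uniform distribution, which fits the "uniformly distributed" insight. A minimal sketch of these statistics, assuming the sample (n - 1) variance and plain moment-based skewness and excess kurtosis, because the report does not name its estimators:

```python
import math


def describe(values):
    """Moment-based summary of a numeric sequence (needs at least 2 values)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    var = sum(d * d for d in dev) / (n - 1)   # sample variance (assumption)
    std = math.sqrt(var)
    m2 = sum(d ** 2 for d in dev) / n          # population central moments
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "sum": sum(values),
        "skewness": m3 / m2 ** 1.5,            # g1, 0 for symmetric data
        "kurtosis": m4 / m2 ** 2 - 3.0,        # excess kurtosis, 0 for a normal
        "cv": std / mean,
    }


# Evenly spaced ids 1..1000: skewness is about 0, excess kurtosis about -1.2.
stats = describe(list(range(1, 1001)))
print(stats["skewness"], stats["kurtosis"])
```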

keyword

categorical

Approximate Distinct Count 221
Approximate Unique (%) 2.9%
Missing 61
Missing (%) 0.8%
Memory Size 556863 B

Length

Mean 8.7372
Standard Deviation 3.461
Median 8
Minimum 4
Maximum 21

Sample

1st row ablaze
2nd row ablaze
3rd row ablaze
4th row ablaze
5th row ablaze

Letter

Count 62389
Lowercase Letter 62389
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 2396

location

categorical

Approximate Distinct Count 3341
Approximate Unique (%) 65.8%
Missing 2533
Missing (%) 33.3%
Memory Size 404598 B
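The two percentage rows use different denominators. Missing (%) is taken over all 7613 rows (2533 / 7613 ≈ 33.3%), whereas Approximate Unique (%) for location matches distinct values over non-missing rows (3341 / (7613 - 2533) = 3341 / 5080 ≈ 65.8%), not over all rows (≈ 43.9%). A minimal sketch of both figures, with None standing in for a missing cell; unlike the report's approximate count, it computes the distinct count exactly:

```python
def column_summary(values):
    """Exact distinct count plus missing and unique percentages.

    Missing (%) uses all rows as the denominator; Unique (%) uses only the
    non-missing rows, which is the convention the location figures suggest.
    """
    n_rows = len(values)
    present = [v for v in values if v is not None]
    n_missing = n_rows - len(present)
    distinct = len(set(present))
    unique_pct = 100.0 * distinct / len(present) if present else 0.0
    return {
        "distinct": distinct,
        "missing": n_missing,
        "missing_pct": 100.0 * n_missing / n_rows,
        "unique_pct": unique_pct,
    }


# distinct=2, missing=1, missing_pct=25.0, unique_pct is about 66.7
print(column_summary(["London, UK", "AFRICA", "London, UK", None]))
```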

Length

Mean 13.6457
Standard Deviation 7.0728
Median 13
Minimum 1
Maximum 49

Sample

1st row Birmingham
2nd row Est. September 201...
3rd row AFRICA
4th row Philadelphia, PA
5th row London, UK

Letter

Count 57531
Lowercase Letter 45347
Space Separator 6168
Uppercase Letter 12184
Dash Punctuation 181
Decimal Number 1218
  • location contains many words: 3248 words
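The row labels in the Letter sections match Unicode general-category names: Lowercase Letter (Ll), Uppercase Letter (Lu), Space Separator (Zs), Dash Punctuation (Pd) and Decimal Number (Nd). Count equals Lowercase Letter plus Uppercase Letter (62389 + 0 = 62389 for keyword; 45347 + 12184 = 57531 for location), so it appears to be the total number of letters. A minimal sketch of such a tally with Python's `unicodedata.category`, assuming missing values are skipped; this is not necessarily how the report computes it:

```python
import unicodedata
from collections import Counter

# Unicode general categories behind the report's row labels.
LABELS = {
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def letter_profile(values):
    """Count characters per Unicode category over non-missing strings."""
    cats = Counter(
        unicodedata.category(ch)
        for v in values
        if v is not None          # skip missing cells
        for ch in v
    )
    profile = {label: cats[code] for code, label in LABELS.items()}
    # "Count" taken as all letters, i.e. lowercase + uppercase.
    profile["Count"] = cats["Ll"] + cats["Lu"]
    return profile


# Ll = 16, Lu = 6, Zs = 2, so Count = 22
print(letter_profile(["Philadelphia, PA", "London, UK", None]))
```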

text

categorical

Approximate Distinct Count 7503
Approximate Unique (%) 98.6%
Missing 0
Missing (%) 0.0%
Memory Size 1370085 B
  • The most frequent value (11-Year-Old Boy Charged With Manslaughter of Toddler: Report: An 11-year-old boy has been charged with manslaughter over the fatal sh...) occurs over 1.67 times as often as the second most frequent value (#Bestnaijamade: 16yr old PKK suicide bomber who detonated bomb in ... http://t.co/KSAwlYuX02 bestnaijamade bestnaijamade bestnaijamade be‰Û_)
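This insight compares how often values occur: the top text value is repeated about 1.67 times as often as the runner-up. A minimal sketch of the check using `collections.Counter`. The 1.5 threshold is a hypothetical cut-off for illustration, because the report does not state the one it uses:

```python
from collections import Counter


def top_two_ratio(values):
    """Return (top value, runner-up, ratio of their frequencies).

    Requires at least two distinct non-missing values.
    """
    counts = Counter(v for v in values if v is not None)
    (first, n1), (second, n2) = counts.most_common(2)
    return first, second, n1 / n2


RATIO_THRESHOLD = 1.5  # hypothetical cut-off for flagging a dominant value

first, second, ratio = top_two_ratio(["a", "a", "a", "a", "a", "b", "b", "b", "c"])
if ratio > RATIO_THRESHOLD:
    print(f"{first!r} occurs {ratio:.2f} times as often as {second!r}")
```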

Length

Mean 101.0374
Standard Deviation 33.7813
Median 107
Minimum 7
Maximum 157

Sample

1st row Our Deeds are the ...
2nd row Forest fire near L...
3rd row All residents aske...
4th row 13,000 people rece...
5th row Just got sent this...

Letter

Count 592335
Lowercase Letter 517835
Space Separator 106041
Uppercase Letter 74500
Dash Punctuation 1753
Decimal Number 15533
  • text contains many words: 22583 words
  • The most frequent word (i) occurs over 2.35 times as often as the second most frequent word (the)
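The word-level insights apply the same frequency idea to tokens instead of whole values. The tokenizer below (lowercase, then take runs of letters, digits and apostrophes) is an assumption: the report does not say how it splits words or whether it drops stop words, so this sketch will not necessarily reproduce the 22583 figure or the 2.35 ratio:

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def word_stats(texts):
    """Total tokens, distinct tokens and the two most frequent tokens."""
    counts = Counter()
    for t in texts:
        counts.update(TOKEN.findall(t.lower()))
    return sum(counts.values()), len(counts), counts.most_common(2)


# 12 tokens, 9 distinct; "i" (3) is followed by "fire" (2)
total, distinct, top2 = word_stats(["I saw a fire", "I hope the fire is out, I think"])
print(total, distinct, top2)
```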

target

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 502458 B

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7613
  • The two categories (0, 1) together account for all 7613 values (100%)
  • target has words of constant length
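Because target has only two categories and no missing values, the two classes together cover every row. The more informative figure is how the rows split between 0 and 1, which the report does not list. A minimal sketch of that class-balance computation, run here on made-up labels rather than on the dataset:

```python
from collections import Counter


def class_shares(labels):
    """Fraction of rows per class label, sorted by label."""
    counts = Counter(labels)
    n = len(labels)
    return {label: counts[label] / n for label in sorted(counts)}


print(class_shares(["0", "1", "0", "0"]))  # {'0': 0.75, '1': 0.25}
```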

Interactions

Correlations

Missing Values